fix(protocol): unify plan/message write paths through contributeOperation (#228) by windoliver · Pull Request #237 · windoliver/grove

windoliver · 2026-04-09T09:02:01Z

Summary

Closes #228. Plan and message operations were calling store.put() directly, bypassing PolicyEnforcer, TopologyRouter, after_contribute hooks, and (transitively) the EnforcingContributionStore rate-limit wrapper. This PR routes them through the canonical contributeOperation write path, adds the missing rate-limit wiring on the MCP entry point, replaces the heuristic 60s dedup window with explicit idempotencyKey, and pins everything down with tests.

11 atomic commits, each independently reviewable and tests-clean.

What changed

Architecture

createPlanOperation / updatePlanOperation become sugars over contributeOperation (commit 5ccf818). The literal protocol: plan/message operations bypass policy enforcement and topology routing #228 bug — agent with allowedKinds: ['work'] could create plans freely — is closed.
sendMessage is deleted and replaced with sendMessageAsDiscussion which sugars over discussOperation/contributeOperation (commit 5535629). MCP grove_send_message, the boardroom HTTP route, the CLI inbox send command, and the e2e test all updated.
mcp/serve.ts now wraps the contribution store with EnforcingContributionStore (commit a520eae) — without this, maxContributionsPerAgentPerHour and clock-skew protection were dead config for any MCP-served contribution. Verified end-to-end by a wiring test that asserts the 2nd contribution from the same agent is rejected at the MCP boundary.
Explicit idempotencyKey field on ContributeInput (commit b483123) replaces the previous 60s same-summary heuristic that false-positived on legitimate updates and missed real retries under concurrency. Per-agent namespaced, 5min TTL, 1024-entry LRU. HTTP-style semantics.

Routing rules (locked in during the #228 review)

The kind-based skip logic in contributeOperation now applies these rules uniformly:

kind	handoffs	route event	stop conditions
plan	no	yes	no
ephemeral message (`context.ephemeral=true`)	no	no	no
discussion (regular)	yes	yes	yes
work / review / reproduction / adoption	yes	yes	yes

Plans are coordination metadata (not progress) and ephemeral messages are chat (not coordination). Both opt out of handoff creation and stop-condition evaluation. Ephemeral messages additionally skip the routing event so chat doesn't wake downstream agents. Pinned by 4 routing tests in contribute-routing.test.ts.

Code quality + perf

Typed context schemas (PlanContext, MessageContext) in core/operations/context-schemas.ts (commit 0bf7685) replace 7 force-cast / magic-string sites in plan and messaging operations. Zod-backed parsers + builders, 21 unit tests.
writeAtomic / writeSerial helpers extracted from the 76-line duck-typed cowrite branch in contributeOperation (commit c6bad62).
/tmp/grove-debug.log scaffolding deleted (commit 4dcb5f9) — 8 inline appendFileSync blocks across contribute.ts and serve.ts from a past debugging session.
validateArtifacts parallelized with Promise.all (commit 744e743) — N artifacts now pay 1×rtt to CAS instead of N×rtt.
handoffStore.createMany batch method (commit 8969f26) — NexusHandoffStore.createMany collapses N handoff inserts into one VFS file write. Fixes the N+1 in the serial write path.
updatePlanOperation validates previous.kind === 'plan' (commit 5ccf818) — was previously possible to derive a 'plan update' from a work / review / discussion contribution.

Tests

plan.test.ts created (commit 7e13d7c) — 19 tests covering validation, stats, derives_from chains, kind check, idempotency, protocol: plan/message operations bypass policy enforcement and topology routing #228 regression, and routing rules. Plan operations had ZERO unit tests before this.
messaging.test.ts rewritten (commit 5535629) — 17 tests covering happy path + 11 missing edge cases including the literal protocol: plan/message operations bypass policy enforcement and topology routing #228 regression test.
contribute-routing.test.ts extended with 4 routing rule tests (commit 3a0d5b6) covering the table above.
mcp/server.integration.test.ts wiring test for the EnforcingContributionStore wrap (commit a520eae).

Test results

374 pass / 0 fail across core/operations/, core/policy-enforcer, core/lifecycle.integration, mcp/server.integration. Two pre-existing flaky tests in tui/spawn-manager and environmental tests for AcpxRuntime / TmuxRuntime are unrelated and were verified to fail on main without these changes.

Stats

11 commits
+2126 / -514 lines
18 files changed
6 new test files / files with significantly expanded coverage

Follow-up issues filed

These were deferred from the #228 review per the locked-in plan, with the workaround comments referencing the issue numbers:

perf(messaging): readInbox does full-table scan + JS filter; needs recipient index #234 — readInbox full-table scan (Issue 14): ~4-6h schema migration, doubles PR size, kept out of scope.
perf(policy): investigate moving stop-condition evaluation entirely post-write #235 — investigate moving stop-condition evaluation entirely post-write (Issue 13D): structural perf cleanup, complementary to the tactical 13A skip in this PR.
core/operations: generalize kind-check validation to reviewOperation and reproduceOperation #236 — generalize kind-check to reviewOperation / reproduceOperation (Issue 6 generalization): same class of bug as the plan-only fix in this PR.

Migration / breaking changes

None at the public API level. Internal callers of the deleted sendMessage (server route, CLI, e2e test) updated in the same commit. The idempotencyKey field is opt-in — existing callers see no behavior change.

Test plan

bun test src/core/operations/ — 283 pass
bun test src/core/policy-enforcer.test.ts — 51 pass
bun test src/core/lifecycle.integration.test.ts — 7 pass
bun test src/mcp/server.integration.test.ts — 10 pass (incl. new rate-limit wiring test)
bun run typecheck — clean
Manual: send a message with a contract that has allowedKinds: ['work'], verify it's rejected (regression check for protocol: plan/message operations bypass policy enforcement and topology routing #228)
Manual: hit MCP rate limit by issuing more grove_submit_work calls than the per-hour quota allows

…ute and serve Deletes 8 inline appendFileSync blocks that wrote to a hardcoded /tmp/grove-debug.log path. These were added during a past debugging session and have outlived their purpose. Why: hardcoded path, sync fs in async hot path, require() in ESM, empty catch swallowing errors, DRY violation across both files, debug observability mixed into core write path. Issue 7A in the #228 review (preparation refactor before unifying the write path through contributeOperation).

Pulls the duck-typed atomic-vs-serial cowrite branching out of contributeOperation into three named module-private helpers: - writeAtomic: SQLite cowrite path (one transaction) - writeSerial: separate-await path (in-memory + Nexus VFS handoffs) - writeContributionWithHandoffs: dispatch based on store capabilities The contribution body is now a single call instead of a 76-line inline branch with capability detection scattered across the condition. The helpers are independently testable and the dispatch rule lives in one place. No behavior change. Verified by 249 core/operations tests passing. Issue 8B in the #228 review (preparation refactor before unifying the write path through contributeOperation).

…ecks Replaces the sequential await loop in validateArtifacts with a single Promise.all over the artifact entries. A contribution with N artifacts now pays 1×rtt to the CAS instead of N×rtt — meaningful when CAS is remote (Nexus) and harmless when local. The asymmetry was the smoking gun: validateRelations already uses getMany() for batched lookup. Someone fixed the relation N+1 and forgot the artifact one. Read-only existence checks have no atomicity concerns, so Promise.all is the simplest correct fix. Issue 16B in the #228 review.

…nd message Adds context-schemas.ts with PlanContext and MessageContext as the single source of truth for the magic context keys used by plan and messaging operations. - PlanContext: { plan_title, tasks } - MessageContext: { ephemeral, recipients, message_body } Each schema has a builder (writer side) and a Zod-backed parser (reader side). Plan and messaging operations now use buildPlanContext / buildMessageContext on the write path, and parsePlanContext / parseMessageContext on the read path. The PlanTask type itself moves to context-schemas.ts (re-exported from plan.ts for compatibility). Removes: - 'as unknown as JsonValue' double-cast on plan tasks (2 sites) - '(c.context?.plan_title as string)' force-cast in updatePlan - '(c.context?.recipients as string[])' force-casts in readInbox (3) - '(c.context?.message_body as string)' force-cast in contributionToMessage - Manual ephemeral/recipients shape check in readInbox readInbox now pairs each contribution with its parsed message context once, then filters/sorts/maps over the typed pair. Wrong-shape contributions are dropped exactly once at parse time, removing the ad-hoc 'is this even a message?' check. Includes 21 unit tests covering the schemas and predicates. Issue 5A in the #228 review (preparation for the unified write path).

Replaces the previous 60s same-summary dedup heuristic with an explicit, opt-in idempotency key on ContributeInput. Modeled after the HTTP Idempotency-Key convention used by Stripe, AWS, and most modern HTTP APIs. Why the heuristic was wrong: - store.list({ limit: 20 }) was unbounded across all agents — under multi-agent load, an agent's own recent contributions could be pushed out of the window by other agents' writes, missing real retries. - summary + agent + kind false-positived on legitimate re-edits (e.g., updatePlan called twice with the same title would silently drop the second call). - The whole thing was a guess at intent. Idempotency keys make the contract explicit. Behavior: - Opt-in: callers that don't pass idempotencyKey get no dedup. - Per-agent namespace: keys are scoped by agent.role (or agentId fallback). Two agents can use the same key without colliding. - 5-minute TTL with LRU eviction at 1024 entries. Per-process cache, not shared across grove instances. - The full ContributeResult is cached and returned on hit, including routing/handoff/policy fields. Includes 5 new tests covering: round-trip cache hit, distinct keys, cross-agent isolation, role-vs-agentId scoping, no-key default. Issue 4A in the #228 review. Note: MCP tools currently don't pass idempotencyKey. Adding support to grove_submit_work / grove_submit_review / etc is a follow-up; existing MCP retry behavior is unchanged from the previous heuristic removal (callers must dedup themselves until then).

Fixes #228 for the plan path: createPlanOperation and updatePlanOperation no longer call store.put() directly. They now sugar over contributeOperation the same way reviewOperation, reproduceOperation, discussOperation, and adoptOperation already do, so plans flow through the canonical write pipeline: - PolicyEnforcer (role-kind constraints) - validateRelations / validateArtifacts - Topology routing event - after_contribute hook - idempotencyKey dedup The literal #228 bug — an agent with allowedKinds=["work"] could create plans freely — is now closed. Routing rules for the plan kind (locked in during the #228 review): kind | handoffs | route event | stop conditions plan | no | yes | no Plans are coordination metadata, not progress. They: - Skip handoff record creation (handoffsRoutedTo = undefined when isPlan) - Still fire the topology routing event (so live UIs can observe creation) - Skip stop-condition evaluation (no full-table scans for chat-equivalent traffic; no broadcastStop) Same skip rules apply to ephemeral messages (kind=discussion + context.ephemeral=true), detected via isEphemeralMessageContext from the new context-schemas module. The rule table for ephemeral messages: kind | handoffs | route event | stop conditions ephemeral msg | no | no | no (Messages skip the route event entirely too — chat shouldn't wake downstream agents.) Implementation (Issues 1A + 13A + 6A landing together because the gating logic is one cohesive change): - contribute.ts: classifies kind/context up front into skipHandoffs / skipRouteEvent / skipStopConditions and threads them through the enforcement, write, and post-write blocks. - policy-enforcer.ts: enforce() takes an options.skipStopConditions flag that bypasses the stop-condition evaluation block (the O(n) scan plus per-root thread() calls). - plan.ts: createPlanOperation and updatePlanOperation become thin sugars over contributeOperation. updatePlanOperation now also validates that previous.kind === 'plan' (Issue 6A), preventing cross-kind derives_from corruption. - Both plan operations expose an idempotencyKey passthrough so MCP callers can opt into retry safety. Issues 1A, 13A, 6A in the #228 review.

…sugar Closes #228 for the messaging path. The previous sendMessage took a raw ContributionStore and a computeCid callback, open-coded the contribution construction, and called store.put() directly — bypassing PolicyEnforcer, TopologyRouter, after_contribute hook, and validateRelations. The new sendMessageAsDiscussion is a thin sugar over contributeOperation that goes through the canonical write pipeline. Behavior changes: - Role-kind constraints now apply (the literal #228 bug for messages). An agent with allowedKinds=['work'] can no longer send messages. - Idempotency keys work for messages (HTTP-style retry safety). - Reply targets are validated via validateRelations — sending a reply to a non-existent CID returns NOT_FOUND instead of writing an orphan. - Returns OperationResult instead of throwing/returning bare Contribution, matching the operation pattern used everywhere else in core/operations. Routing rules for ephemeral messages (set in contributeOperation via isEphemeralMessageContext): kind | handoffs | route event | stop conditions ephemeral msg | no | no | no Chat does not wake downstream agents and does not contribute to stop-condition evaluation. Caller updates (3C in the #228 review): - src/mcp/tools/messaging.ts: grove_send_message uses sendMessageAsDiscussion + the standard toMcpResult adapter. - src/server/routes/boardroom.ts: POST /api/boardroom/message uses the new helper with toOperationDeps adapter, returns 400 on validation errors. - src/cli/commands/inbox.ts: grove inbox send uses the new helper with a small CliDeps→OperationDeps adapter inline. - tests/boardroom/e2e-workflow.test.ts: rewrites the message send steps against the new signature. Test rewrite (10A in the #228 review): - src/core/operations/messaging.test.ts: full rewrite. Adds: - #228 regression test: contract with allowedKinds=['work'] blocks sendMessageAsDiscussion (this test would have failed against the old implementation; now passes). - inReplyTo target validation (NOT_FOUND for missing target) - multi-recipient round-trip - idempotencyKey collapse - empty inbox + non-message discussion exclusion - multi-recipient OR semantics in readInbox - limit cap enforcement 17 tests total covering happy path + 11 edge cases. Issues 3C + 10A in the #228 review.

…serve The MCP entry point (src/mcp/serve.ts) was wiring the raw contribution store directly into deps, bypassing rate limits / clock-skew protection configured in GROVE.md. Compare to src/cli/commands/contribute.ts:353 and src/cli/commands/discuss.ts:127 which already wrap correctly. The bypass meant maxContributionsPerAgentPerHour / maxContributionsPerGrovePerHour / maxArtifactSizeBytes were dead config for any agent talking to grove via MCP — including the agent retry storms and runaway-loop scenarios these limits exist to prevent. This is broader than the literal #228 ('plan/message bypass policy') but in the same family: a contribution operation reaching the store without going through the configured policy layer. Bundled with the #228 PR because fixing #228 without this leaves the 'unified pipeline' claim half-true. Wiring test (Issue 11A): server.integration.test.ts now exercises the wrap end-to-end. It builds a contract with maxContributionsPerAgentPerHour: 1, mirrors serve.ts's wrap logic, and asserts that the second grove_submit_work call from the same agent is rejected with a rate-limit error at the MCP boundary. This test would have failed against the previous serve.ts. Issues 2A + 11A in the #228 review.

Before this commit, plan.ts had ZERO unit tests. The MCP integration test only verified the tool was registered by name. After Issue 1A landed and made plans flow through contributeOperation, the lack of tests was a real risk: a refactor of contributeOperation could silently break plan semantics with no safety net. 19 tests covering: createPlanOperation: - Happy path with stats computation (todo / in_progress / done / blocked) - kind=plan + mode=exploration + 'plan' tag stored correctly - Typed PlanContext via buildPlanContext (plan_title + tasks fields) - User tag preservation - Empty title rejection - Empty task list rejection - Mixed-status stats counting - #228 regression: contract with allowedKinds=['work'] blocks plan creation - idempotencyKey collapses repeated calls updatePlanOperation: - v2 with derives_from relation to v1 - Title falls through from previous when omitted - Explicit title override - NOT_FOUND when previous CID is missing - Issue 6A: rejects update when previous CID is not a plan kind - Empty task list rejection - 3-version derives_from chain with title fallthrough Routing semantics (Issues 1A + 13A): - Plans do NOT generate handoffs even when topology routes the role - Plans bypass stop-condition evaluation (write succeeds even with maxRoundsWithoutImprovement=0 which would normally block) Issue 9A in the #228 review.

Adds 4 routing tests that codify the per-kind side-effect rules locked in during the #228 review. These tests prevent Issues 1A + 13A from becoming silent behavior changes — anyone refactoring the kind-based skip logic in contribute.ts now has explicit failing tests if they violate the contract. Routing rule table: kind | handoffs | route event | stop conditions ----------------|----------|-------------|---------------- plan | no | yes | no ephemeral msg | no | no | no discussion | yes | yes | yes work / review | yes | yes | yes New tests: 1. plan kind fires routing event but creates no handoff - Spy handoff store, verify no create() calls for kind=plan - Verify routing event still received by downstream role subscriber 2. ephemeral message kind skips both routing event AND handoffs - Discussion with context.ephemeral=true: no handoff, no event 3. non-ephemeral discussion routes normally - Discussion without ephemeral flag still creates handoff + event - Confirms no regression for the regular discussOperation path 4. plan does not trigger broadcastStop (Issue 13A) - Pre-populate store + budget=1 stop condition (would normally fire) - Plan write succeeds without triggering broadcastStop because plans skip stop-condition evaluation entirely Issue 12A in the #228 review.

Adds an optional createMany() method to the HandoffStore interface and uses it from the contributeOperation serial write path. NexusHandoffStore implements it as a single VFS file write (one casUpdate, one HTTP round-trip), and create() now delegates to createMany([input]) so the two paths share one code path. Before this change, contributeOperation's serial path looped over each routing target awaiting handoffStore.create() sequentially: for (const targetRole of routedTo) { await handoffStore.create({ ...sourceCid, fromRole, toRole }) } For a contribution routed to N downstream roles via Nexus, that was N×rtt against the handoff store. The atomic SQLite cowrite path didn't have this problem (it writes everything in one transaction), but Nexus is the configured production store in src/mcp/serve.ts. Now writeSerial calls handoffStore.createMany(inputs) when available and falls back to the create() loop otherwise. NexusHandoffStore's createMany() collapses N inserts into one casUpdate cycle. The HandoffStore interface change is backwards-compatible: createMany is optional, so existing implementations (in-memory, SQLite) keep working unchanged. SQLite-backed stores already use the atomic cowrite path so they don't need the batch method. Verified by 374 tests across 23 files (operations, policy-enforcer, lifecycle integration, MCP integration). Issue 15A in the #228 review.

grove_done writes a kind=discussion contribution to mark session termination. Before this fix, that discussion was treated like any other (non-ephemeral) discussion by contributeOperation — topology routing fired and a handoff record was created for the downstream role. Discovered during #228 E2E validation. A real review loop produced: handoffId sourceCid from→to status 9152010f... work coder→reviewer replied ca1ef83d... review reviewer→coder pending_pickup 92cf7e7b... done-discussion reviewer→coder pending_pickup ← wart The ca1ef83d handoff is correct (review feedback routes back to the coder, who may iterate). The 92cf7e7b handoff is semantic nonsense: the session is ending, so there's nothing for the "downstream" agent to pick up. The handoff sits in pending_pickup forever. Fix: one-line change in src/mcp/tools/done.ts — set context.ephemeral=true on the done discussion. This routes it through the same skip path as ephemeral messages in contribute.ts (isEphemeralMessageContext check), suppressing both the handoff creation and the routing event. The [DONE] marker still lands in the DAG as a discussion contribution (so grove_log and the TUI can show it), but it no longer creates coordination records. Related: - Updates isEphemeralMessageContext doc in context-schemas.ts to clarify it now covers two shapes: chat messages and done markers. The function name stays historical. - Adds a targeted routing test in contribute-routing.test.ts that exercises the exact shape done.ts writes, pinning the behavior against future regressions. Bundled into the #228 PR because the wart was discovered during the #228 E2E validation and is closely related to the routing rules the PR establishes.

Codex adversarial review of PR #237 flagged three high-severity issues that the existing test suite wasn't catching. All three are fixed here. Finding 1 — Any caller can mark real work as ephemeral and bypass routing/handoffs/stop checks (contribute.ts) isEphemeralMessageContext() returned true for any context.ephemeral=true regardless of kind. A caller passing { kind: "work", context: { ephemeral: true } } to grove_submit_work would skip the entire side-effect pipeline AND be filtered out by the frontier calculator, making real progress invisible to downstream reviewers, budget, and quorum logic. Fix: reject the combination at the boundary with VALIDATION_ERROR, and gate the skip path on kind === "discussion". The ephemeral flag is now strictly reserved for chat messages and grove_done markers. Tests: - 'ephemeral flag on non-discussion kind is rejected' — iterates over work/review/reproduction/adoption, asserts each is rejected - 'ephemeral flag on discussion is allowed' — positive control Finding 2 — Idempotency cache was not single-flight or fingerprint-validated (contribute.ts) Two flaws: a) Two concurrent retries with the same key could both miss the cache (populated post-write) and create duplicate writes. b) Reusing the same key with different body silently returned the first call's result, hiding client bugs. Fix: rewrite the cache around CachedIdempotencyEntry with { fingerprint, pending?, value? }. Three new helpers: - computeIdempotencyFingerprint: canonical hash of kind/mode/summary/description/artifacts/relations/tags + agentScope (role ?? agentId, matching the cache-key namespace) - lookupIdempotency: returns pending | value | conflict | undefined - reserveIdempotencySlot: synchronously inserts a pending Promise and returns a resolver. Check-then-insert is synchronous (single-threaded JS) so concurrent callers cannot both race past a miss. contributeOperation now: 1. On hit (pending): awaits the in-flight promise 2. On hit (value): returns cached result 3. On fingerprint conflict: returns STATE_CONFLICT 4. On miss: reserves a slot, runs write, resolves on success (cached) or validation-error (cached), releases on thrown error (retries can proceed) Tests updated + added: - 'repeated call with same key + same input returns cached result' - 'same key + different input is rejected with STATE_CONFLICT' - 'concurrent calls with same key are single-flight (one write)' — runs two calls via Promise.all, verifies only one contribution in the store - 'agent role scopes idempotency: identical payload from two coders shares cache' — multi-instance role with identical fingerprint - 'role scope: same key + different summary is STATE_CONFLICT' - messaging.test.ts + plan.test.ts: matching rename + new conflict test Finding 3 — Wrapping MCP store dropped atomic cowrite to best-effort (enforcing-store.ts, contribute.ts) Commit a520eae wrapped the contribution store with EnforcingContributionStore in mcp/serve.ts. But the wrapper didn't forward putWithCowrite, so contributeOperation's capability check couldn't detect the atomic path — local MCP sessions with a contract silently lost transactional contribution+handoff writes. Fix: EnforcingContributionStore gains an async putWithCowrite method that acquires the shared mutex, runs rate-limit enforcement, runs the per-CID preWriteHook, then delegates to inner.putWithCowrite (sync, runs inside the inner SQLite transaction). writeAtomic in contribute.ts now accepts either a sync or async putWithCowrite and awaits the return if it's a Promise. Tests: - 'exposes putWithCowrite when inner store supports it' - 'putWithCowrite runs enforcement then delegates to inner cowrite' — observable cowriteRan flag + stored contribution - 'putWithCowrite still enforces rate limits' — RateLimitError on second write from same agent with max=1 Test results: 371 pass / 0 fail across src/core/operations and enforcing-store. 1281 pass / 2 fail across all of src/core + MCP integration; the 2 failures are pre-existing environmental tests (AcpxRuntime/TmuxRuntime) unrelated to these fixes. Original findings: codex adversarial review on PR #237 — 2026-04-09

Round 2 of adversarial review on PR #237. Three high-severity issues, all fixed with targeted regression tests. Finding 1 — Idempotency fingerprint ignores context, scores, and artifact names (contribute.ts) computeIdempotencyFingerprint() hashed kind/mode/summary/description, but explicitly omitted context (where plans store their task list, messages store body+recipients), scores (per-metric payloads), and artifact NAMES (only hashes). A caller reusing the same key with different plan tasks, different scores, or a renamed artifact would hit the cached success path instead of STATE_CONFLICT — silently acknowledging the second request while the stored contribution still reflects the first. Fix: expand the fingerprint to include context, scores, and name→hash artifact pairs. Add canonicalizeForFingerprint() which deep-sorts object keys recursively so { a: 1, b: 2 } and { b: 2, a: 1 } hash identically (object-key insertion order is not semantic). Regression tests: - 'fingerprint rejects same key + different context (plan tasks)' — two plans with different task statuses, second must reject - 'fingerprint rejects same key + different scores' — metric value change under same key must reject - 'fingerprint rejects same key + renamed artifact (same hash)' — same blob, different filename, must reject - 'fingerprint is insensitive to context key order' — deep key canonicalization test, nested objects Finding 2 — Post-commit failure clears the idempotency slot and allows duplicate writes on retry (contribute.ts) The slot was only resolved at the END of contributeOperation, after all post-write side effects (onContributionWrite callback, persistOutcome, topology event, stop recheck, after_contribute hook). If any of those threw, the outer catch released the slot — even though the contribution was ALREADY durably written. A retry with the same idempotencyKey would then execute a fresh write path with a new createdAt, producing a duplicate contribution + handoff fan-out. Fix: treat the durable contribution write as the idempotent boundary. Build a 'committedResult' immediately after writeContributionWithHandoffs() returns, resolve the slot with it, then CLEAR the local idempotencySlot reference so the outer catch cannot release it. Wrap onContributionWrite/onContributionWritten callbacks and persistOutcome in try/catch — they log warnings but cannot propagate and cannot undo the commit from the caller's perspective. The direct first caller still receives a 'final' result at the end of the function that may include post-write updates (e.g., stop-condition recheck detecting a threshold crossing). Cached retries get the narrower committed-only snapshot, which is correct — the contribution is the same, only the advisory stop signal differs, and cached retries can't observe the updated state anyway. Regression test: - 'post-commit callback failure does NOT release the idempotency slot' — injects a throwing onContributionWritten callback, runs the same request twice with the same key, asserts both return the same cid AND that exactly ONE contribution exists in the store. Finding 3 — grove inbox send still bypasses contract enforcement and topology routing (cli/commands/inbox.ts) The previous fix (PR #237 commit 5535629) routed inbox send through sendMessageAsDiscussion(), but the CLI handler built OperationDeps via cliDepsToOperationDeps() which carried only raw store, claim store, cas, frontier, outcome store — NO contract, NO handoff store, NO topology router. Result: an agent could use `grove inbox send` to create a discussion contribution even when GROVE.md restricted it via agent_constraints.allowed_kinds, bypassing the same check MCP-routed messages now respect. The CLI was the remaining backdoor. Fix: rewrite handleSend to mirror the `grove discuss` bootstrap: load GROVE.md, parse the contract, wrap the contribution store with EnforcingContributionStore when a contract exists, include the handoff store + contract in the OperationDeps passed to sendMessageAsDiscussion. Same pattern as discuss.ts:96-155. Regression tests (new src/cli/commands/inbox.test.ts): - 'succeeds when GROVE.md does not exist' — no contract, message goes through (same policy as discuss) - 'succeeds when contract allows discussion kind' — positive control, verifies the stored contribution - 'rejects when contract restricts allowed_kinds to ['work']' — the literal Codex finding. Runs handleInbox with a GROVE.md that blocks discussions, asserts non-zero exit, stderr contains the role_kind message, and NO discussion contribution landed in the DAG. Test results: 389 pass / 0 fail across src/core/operations, src/cli/commands/inbox.test.ts, src/core/enforcing-store.test.ts, src/mcp/server.integration.test.ts. Original findings: codex adversarial review round 2 on PR #237

Round 3 of adversarial review on PR #237. Two more high-severity issues fixed with regression tests. Finding 1 — grove inbox send silently disables enforcement on malformed GROVE.md (cli/commands/inbox.ts) The previous fix wrapped readFile() and parseGroveContract() in one broad try/catch. A YAML parse error, a schema validation failure, even a permission-denied error were all treated the same as ENOENT → contract stayed undefined → EnforcingContributionStore was never applied → CLI sends silently bypassed allowed_kinds, rate limits, and clock-skew checks. This reopened the exact bypass the round-2 enforcement fix was supposed to close. A broken GROVE.md could make the CLI send path quietly unenforced with no operator-visible signal. Fix: separate readFile from parseGroveContract. Only catch ENOENT (file-does-not-exist → proceed without enforcement, consistent with grove discuss). Any other file error re-throws. Parse errors are completely outside the catch so they propagate to the caller. Regression test: 'fails closed when GROVE.md is malformed (YAML parse error)' writes a GROVE.md with an unclosed bracket and asserts the command throws AND no discussion contribution lands in the DAG. Finding 2 — grove_done no longer emits any completion signal to event-driven clients (core/operations/contribute.ts) Round-2 fix made grove_done write context.ephemeral=true so the done marker would skip handoff creation. But that also put done markers on the 'ephemeral chat' skip path in contributeOperation, suppressing the topology route event. Result: event-bus clients like TUI's useDoneDetection() never observed session completion because it disables polling whenever an EventBus is present and waits exclusively for contribution events on the bus. All agents could call grove_done and the session would stay stuck in 'running' forever. Fix: split the classification so done markers are asymmetric from chat. New routing table: kind | handoffs | route event | stop conditions plan | no | yes | no discussion (done) | no | yes | no discussion (chat) | no | no | no discussion (plain) | yes | yes | yes work / review / etc | yes | yes | yes Implementation: contribute.ts now computes isDoneMarker (kind=discussion + context.done=true) and isEphemeralChat (kind=discussion + isEphemeralMessageContext && !isDoneMarker) separately. skipHandoffs covers both; skipRouteEvent covers only isEphemeralChat. Done markers still publish through the topology router so useDoneDetection advances. Regression tests: - 'grove_done discussion (done=true) skips handoff but fires route event' — writes the exact shape done.ts emits, asserts handoff count = 0 AND route event received with the [DONE] summary. - 'ephemeral chat (no done flag) skips both handoff AND route event' — pins the ephemeral-chat branch explicitly so future refactors of the classification logic can't collapse the two rows back together. Also updated the pre-existing 'grove_done discussion' test that previously asserted no route event — it was wrong, now asserts the correct new behavior. Test results: 391 pass / 0 fail across all touched suites. Original findings: codex adversarial review round 3 on PR #237

windoliver added 15 commits April 9, 2026 00:29

windoliver merged commit 772290d into main Apr 9, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(protocol): unify plan/message write paths through contributeOperation (#228)#237

fix(protocol): unify plan/message write paths through contributeOperation (#228)#237
windoliver merged 15 commits intomainfrom
worktree-optimized-toasting-perlis

windoliver commented Apr 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

windoliver commented Apr 9, 2026

Summary

What changed

Architecture

Routing rules (locked in during the #228 review)

Code quality + perf

Tests

Test results

Stats

Follow-up issues filed

Migration / breaking changes

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant